Overview

Brought to you by YData

Dataset statistics

Number of variables: 15
Number of observations: 72458
Missing cells: 32260
Missing cells (%): 3.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.8 MiB
Average record size in memory: 113.0 B
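
The missing-cell percentage can be checked from the other figures in this report. The sketch below is a minimal arithmetic check, assuming the percentage is missing cells divided by variables times observations; the per-column counts are copied from the Alerts section, and the variable names are illustrative only.

```python
# Figures copied from the dataset statistics and the alerts in this report.
n_variables = 15
n_observations = 72458
missing_cells = 32260

# Per-column missing counts reported in the Alerts section.
missing_by_column = {
    "is_employed": 25515,
    "housing_type": 1686,
    "num_vehicles": 1686,
    "gas_usage": 1686,
    "recent_move_b": 1687,
}

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# The per-column counts should add up to the reported total.
columns_sum = sum(missing_by_column.values())

print(f"total cells: {total_cells}")
print(f"missing cells (%): {missing_pct:.1f}%")
print(f"sum of per-column missing counts: {columns_sum}")
```

The five columns flagged as Missing account for every missing cell in the dataset.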

Variable types

Numeric: 7
Text: 2
Categorical: 3
Boolean: 3

Alerts

is_employed is highly imbalanced (71.7%) Imbalance
health_ins is highly imbalanced (54.6%) Imbalance
is_employed has 25515 (35.2%) missing values Missing
housing_type has 1686 (2.3%) missing values Missing
num_vehicles has 1686 (2.3%) missing values Missing
gas_usage has 1686 (2.3%) missing values Missing
recent_move_b has 1687 (2.3%) missing values Missing
Unnamed: 0 has unique values Unique
custid has unique values Unique
income has 6691 (9.2%) zeros Zeros
num_vehicles has 4636 (6.4%) zeros Zeros
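
The two imbalance percentages above are consistent with a score of one minus the normalized Shannon entropy of the class counts, with missing values excluded. The sketch below reproduces both figures from the True/False counts reported for each variable. That the report uses exactly this formula is an assumption inferred from the numbers, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class distribution and k is the number of classes.

    0 means perfectly balanced, 1 means a single class holds everything.
    Assumes at least two classes.
    """
    total = sum(counts)
    k = len(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# True/False counts taken from the variable sections of this report.
is_employed_score = imbalance_score([44630, 2313])
health_ins_score = imbalance_score([65553, 6905])

print(f"is_employed: {is_employed_score:.1%}")
print(f"health_ins: {health_ins_score:.1%}")
```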

Reproduction

Analysis started: 2024-10-18 11:09:01.675022
Analysis finished: 2024-10-18 11:09:12.721890
Duration: 11.05 seconds
Software version: ydata-profiling v4.11.0
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ)

Unique 

Distinct: 72458
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49910.638
Minimum: 7
Maximum: 100000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 566.2 KiB

Quantile statistics

Minimum: 7
5-th percentile: 5122.7
Q1: 24911.25
median: 49838
Q3: 74786.75
95-th percentile: 95058.3
Maximum: 100000
Range: 99993
Interquartile range (IQR): 49875.5

Descriptive statistics

Standard deviation: 28772.083
Coefficient of variation (CV): 0.57647195
Kurtosis: -1.1937678
Mean: 49910.638
Median Absolute Deviation (MAD): 24938
Skewness: 0.0066722823
Sum: 3.616425 × 10^9
Variance: 8.2783274 × 10^8
Monotonicity: Strictly increasing
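
Several of the statistics above are derived from others in the same block. The sketch below recomputes the range, IQR, coefficient of variation, and variance for `Unnamed: 0` from the reported minimum, maximum, quartiles, mean, and standard deviation. It uses the standard textbook definitions, which are assumed to match what the report computes; small differences come from the rounding of the printed inputs.

```python
# Values copied from the Unnamed: 0 statistics in this report.
minimum, maximum = 7, 100000
q1, q3 = 24911.25, 74786.75
mean, std = 49910.638, 28772.083

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.4e}")
```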
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
7 | 1 | < 0.1%
66447 | 1 | < 0.1%
66456 | 1 | < 0.1%
66455 | 1 | < 0.1%
66454 | 1 | < 0.1%
66453 | 1 | < 0.1%
66450 | 1 | < 0.1%
66449 | 1 | < 0.1%
66446 | 1 | < 0.1%
66458 | 1 | < 0.1%
Other values (72448) | 72448 | > 99.9%

Value | Count | Frequency (%)
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%
11 | 1 | < 0.1%
15 | 1 | < 0.1%
17 | 1 | < 0.1%
19 | 1 | < 0.1%
20 | 1 | < 0.1%
21 | 1 | < 0.1%

Value | Count | Frequency (%)
100000 | 1 | < 0.1%
99999 | 1 | < 0.1%
99998 | 1 | < 0.1%
99997 | 1 | < 0.1%
99996 | 1 | < 0.1%
99995 | 1 | < 0.1%
99994 | 1 | < 0.1%
99993 | 1 | < 0.1%
99991 | 1 | < 0.1%
99990 | 1 | < 0.1%

custid
Text

Unique 

Distinct: 72458
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 566.2 KiB

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 869496
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 72458
Unique (%): 100.0%

Sample

1st row: 000006646_03
2nd row: 000007827_01
3rd row: 000008359_04
4th row: 000008529_01
5th row: 000008744_02
Value | Count | Frequency (%)
000006646_03 | 1 | < 0.1%
000026926_01 | 1 | < 0.1%
000015018_01 | 1 | < 0.1%
000017314_02 | 1 | < 0.1%
000054759_02 | 1 | < 0.1%
000017383_04 | 1 | < 0.1%
000019351_01 | 1 | < 0.1%
000019351_02 | 1 | < 0.1%
000028817_02 | 1 | < 0.1%
000008744_02 | 1 | < 0.1%
Other values (72448) | 72448 | > 99.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 311165 | 35.8%
1 | 109450 | 12.6%
_ | 72458 | 8.3%
2 | 69823 | 8.0%
3 | 51814 | 6.0%
4 | 47832 | 5.5%
5 | 43051 | 5.0%
6 | 41249 | 4.7%
7 | 40992 | 4.7%
9 | 40896 | 4.7%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 869496 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 869496 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 869496 | 100.0%

sex
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 566.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 5.0340059
Min length: 4

Characters and Unicode

Total characters: 364754
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Female
3rd row: Female
4th row: Female
5th row: Male

Common Values

Value | Count | Frequency (%)
Female | 37461 | 51.7%
Male | 34997 | 48.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
female | 37461 | 51.7%
male | 34997 | 48.3%

Most occurring characters

Value | Count | Frequency (%)
e | 109919 | 30.1%
a | 72458 | 19.9%
l | 72458 | 19.9%
F | 37461 | 10.3%
m | 37461 | 10.3%
M | 34997 | 9.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 364754 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 364754 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 364754 | 100.0%

is_employed
Boolean

Imbalance  Missing 

Distinct: 2
Distinct (%): < 0.1%
Missing: 25515
Missing (%): 35.2%
Memory size: 566.2 KiB

Value | Count | Frequency (%)
True | 44630 | 61.6%
False | 2313 | 3.2%
(Missing) | 25515 | 35.2%

income
Real number (ℝ)

Zeros 

Distinct: 4445
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41881.435
Minimum: -6900
Maximum: 1257000
Zeros: 6691
Zeros (%): 9.2%
Negative: 45
Negative (%): 0.1%
Memory size: 566.2 KiB

Quantile statistics

Minimum: -6900
5-th percentile: 0
Q1: 10700
median: 26400
Q3: 52000
95-th percentile: 125000
Maximum: 1257000
Range: 1263900
Interquartile range (IQR): 41300

Descriptive statistics

Standard deviation: 58274.605
Coefficient of variation (CV): 1.3914185
Kurtosis: 37.944025
Mean: 41881.435
Median Absolute Deviation (MAD): 18400
Skewness: 4.87276
Sum: 3.034645 × 10^9
Variance: 3.3959296 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
0 | 6691 | 9.2%
30000 | 1650 | 2.3%
20000 | 1394 | 1.9%
40000 | 1390 | 1.9%
50000 | 1357 | 1.9%
12000 | 1126 | 1.6%
25000 | 1094 | 1.5%
60000 | 1053 | 1.5%
35000 | 939 | 1.3%
15000 | 840 | 1.2%
Other values (4435) | 54924 | 75.8%

Value | Count | Frequency (%)
-6900 | 1 | < 0.1%
-6800 | 2 | < 0.1%
-6700 | 2 | < 0.1%
-6600 | 1 | < 0.1%
-6100 | 1 | < 0.1%
-6000 | 5 | < 0.1%
-5900 | 1 | < 0.1%
-5800 | 1 | < 0.1%
-5700 | 1 | < 0.1%
-5500 | 2 | < 0.1%

Value | Count | Frequency (%)
1257000 | 1 | < 0.1%
1051000 | 2 | < 0.1%
997000 | 1 | < 0.1%
897100 | 1 | < 0.1%
868200 | 1 | < 0.1%
861000 | 1 | < 0.1%
859000 | 1 | < 0.1%
812000 | 3 | < 0.1%
787000 | 1 | < 0.1%
766000 | 1 | < 0.1%

marital_status
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 566.2 KiB

Length

Max length: 18
Median length: 7
Mean length: 10.188219
Min length: 7

Characters and Unicode

Total characters: 738218
Distinct characters: 19
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Never married
2nd row: Divorced/Separated
3rd row: Never married
4th row: Widowed
5th row: Divorced/Separated

Common Values

Value | Count | Frequency (%)
Married | 38040 | 52.5%
Never married | 19120 | 26.4%
Divorced/Separated | 10572 | 14.6%
Widowed | 4726 | 6.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
married | 57160 | 62.4%
never | 19120 | 20.9%
divorced/separated | 10572 | 11.5%
widowed | 4726 | 5.2%

Most occurring characters

Value | Count | Frequency (%)
r | 154584 | 20.9%
e | 131842 | 17.9%
d | 87756 | 11.9%
a | 78304 | 10.6%
i | 72458 | 9.8%
M | 38040 | 5.2%
v | 29692 | 4.0%
(space) | 19120 | 2.6%
m | 19120 | 2.6%
N | 19120 | 2.6%
Other values (9) | 88182 | 11.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 738218 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 738218 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 738218 | 100.0%

health_ins
Boolean

Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 70.9 KiB

Value | Count | Frequency (%)
True | 65553 | 90.5%
False | 6905 | 9.5%

housing_type
Categorical

Missing 

Distinct: 4
Distinct (%): < 0.1%
Missing: 1686
Missing (%): 2.3%
Memory size: 566.2 KiB

Length

Max length: 28
Median length: 24
Mean length: 20.125586
Min length: 6

Characters and Unicode

Total characters: 1424328
Distinct characters: 22
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Homeowner free and clear
2nd row: Rented
3rd row: Homeowner with mortgage/loan
4th row: Homeowner free and clear
5th row: Rented

Common Values

Value | Count | Frequency (%)
Homeowner with mortgage/loan | 31092 | 42.9%
Rented | 21956 | 30.3%
Homeowner free and clear | 16604 | 22.9%
Occupied with no rent | 1120 | 1.5%
(Missing) | 1686 | 2.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
homeowner | 47696 | 25.6%
with | 32212 | 17.3%
mortgage/loan | 31092 | 16.7%
rented | 21956 | 11.8%
free | 16604 | 8.9%
and | 16604 | 8.9%
clear | 16604 | 8.9%
occupied | 1120 | 0.6%
no | 1120 | 0.6%
rent | 1120 | 0.6%

Most occurring characters

Value | Count | Frequency (%)
e | 222448 | 15.6%
o | 158696 | 11.1%
n | 119588 | 8.4%
(space) | 115356 | 8.1%
r | 113116 | 7.9%
a | 95392 | 6.7%
t | 86380 | 6.1%
w | 79908 | 5.6%
m | 78788 | 5.5%
g | 62184 | 4.4%
Other values (12) | 292472 | 20.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1424328 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1424328 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1424328 | 100.0%

num_vehicles
Real number (ℝ)

Missing  Zeros 

Distinct: 7
Distinct (%): < 0.1%
Missing: 1686
Missing (%): 2.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0668202
Minimum: 0
Maximum: 6
Zeros: 4636
Zeros (%): 6.4%
Negative: 0
Negative (%): 0.0%
Memory size: 566.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 2
Q3: 3
95-th percentile: 4
Maximum: 6
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.170076
Coefficient of variation (CV): 0.56612374
Kurtosis: 0.80250553
Mean: 2.0668202
Median Absolute Deviation (MAD): 1
Skewness: 0.67181187
Sum: 146273
Variance: 1.3690778
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Value | Count | Frequency (%)
2 | 28052 | 38.7%
1 | 17445 | 24.1%
3 | 13094 | 18.1%
4 | 5100 | 7.0%
0 | 4636 | 6.4%
5 | 1628 | 2.2%
6 | 817 | 1.1%
(Missing) | 1686 | 2.3%

Value | Count | Frequency (%)
0 | 4636 | 6.4%
1 | 17445 | 24.1%
2 | 28052 | 38.7%
3 | 13094 | 18.1%
4 | 5100 | 7.0%
5 | 1628 | 2.2%
6 | 817 | 1.1%

Value | Count | Frequency (%)
6 | 817 | 1.1%
5 | 1628 | 2.2%
4 | 5100 | 7.0%
3 | 13094 | 18.1%
2 | 28052 | 38.7%
1 | 17445 | 24.1%
0 | 4636 | 6.4%

age
Real number (ℝ)

Distinct: 81
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.208893
Minimum: 0
Maximum: 120
Zeros: 77
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 566.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 23
Q1: 34
median: 48
Q3: 62
95-th percentile: 80
Maximum: 120
Range: 120
Interquartile range (IQR): 28

Descriptive statistics

Standard deviation: 18.090035
Coefficient of variation (CV): 0.36761718
Kurtosis: -0.39362047
Mean: 49.208893
Median Absolute Deviation (MAD): 14
Skewness: 0.37562027
Sum: 3565578
Variance: 327.24935
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
26 | 1462 | 2.0%
45 | 1433 | 2.0%
30 | 1420 | 2.0%
54 | 1414 | 2.0%
25 | 1404 | 1.9%
27 | 1394 | 1.9%
53 | 1381 | 1.9%
56 | 1379 | 1.9%
46 | 1368 | 1.9%
21 | 1350 | 1.9%
Other values (71) | 58453 | 80.7%

Value | Count | Frequency (%)
0 | 77 | 0.1%
21 | 1350 | 1.9%
22 | 1299 | 1.8%
23 | 1324 | 1.8%
24 | 1293 | 1.8%
25 | 1404 | 1.9%
26 | 1462 | 2.0%
27 | 1394 | 1.9%
28 | 1333 | 1.8%
29 | 1265 | 1.7%

Value | Count | Frequency (%)
120 | 66 | 0.1%
114 | 60 | 0.1%
110 | 65 | 0.1%
100 | 71 | 0.1%
96 | 6 | < 0.1%
95 | 90 | 0.1%
94 | 275 | 0.4%
93 | 123 | 0.2%
92 | 83 | 0.1%
91 | 57 | 0.1%

state_of_res
Text

Distinct: 51
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 566.2 KiB

Length

Max length: 20
Median length: 13
Mean length: 8.4383781
Min length: 4

Characters and Unicode

Total characters: 611428
Distinct characters: 46
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Alabama
2nd row: Alabama
3rd row: Alabama
4th row: Alabama
5th row: Alabama
Value | Count | Frequency (%)
california | 8870 | 10.5%
new | 7114 | 8.4%
texas | 5938 | 7.0%
florida | 4921 | 5.8%
york | 4375 | 5.2%
carolina | 3496 | 4.1%
pennsylvania | 2968 | 3.5%
illinois | 2896 | 3.4%
ohio | 2587 | 3.1%
north | 2498 | 3.0%
Other values (45) | 38730 | 45.9%

Most occurring characters

Value | Count | Frequency (%)
a | 79795 | 13.1%
i | 70388 | 11.5%
n | 52941 | 8.7%
o | 50637 | 8.3%
s | 39554 | 6.5%
r | 39013 | 6.4%
e | 37030 | 6.1%
l | 31315 | 5.1%
t | 14701 | 2.4%
C | 14635 | 2.4%
Other values (36) | 181419 | 29.7%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 611428 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 611428 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 611428 | 100.0%

code_column
Real number (ℝ)

Distinct: 49
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3285.5236
Minimum: 131
Maximum: 8962
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 566.2 KiB

Quantile statistics

Minimum: 131
5-th percentile: 407
Q1: 1305
median: 2269
Q3: 4979
95-th percentile: 8962
Maximum: 8962
Range: 8831
Interquartile range (IQR): 3674

Descriptive statistics

Standard deviation: 2661.7752
Coefficient of variation (CV): 0.81015253
Kurtosis: -0.11789355
Mean: 3285.5236
Median Absolute Deviation (MAD): 1222
Skewness: 1.0290568
Sum: 2.3806247 × 10^8
Variance: 7085047.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)

Value | Count | Frequency (%)
8962 | 8870 | 12.2%
6026 | 5938 | 8.2%
4979 | 4921 | 6.8%
4431 | 4375 | 6.0%
2997 | 2968 | 4.1%
2925 | 2896 | 4.0%
2614 | 2587 | 3.6%
2357 | 2329 | 3.2%
2269 | 2246 | 3.1%
2198 | 2177 | 3.0%
Other values (39) | 33151 | 45.8%

Value | Count | Frequency (%)
131 | 130 | 0.2%
146 | 146 | 0.2%
162 | 160 | 0.2%
170 | 337 | 0.5%
188 | 186 | 0.3%
204 | 198 | 0.3%
218 | 216 | 0.3%
220 | 218 | 0.3%
307 | 305 | 0.4%
325 | 319 | 0.4%

Value | Count | Frequency (%)
8962 | 8870 | 12.2%
6026 | 5938 | 8.2%
4979 | 4921 | 6.8%
4431 | 4375 | 6.0%
2997 | 2968 | 4.1%
2925 | 2896 | 4.0%
2614 | 2587 | 3.6%
2357 | 2329 | 3.2%
2269 | 2246 | 3.1%
2198 | 2177 | 3.0%

gas_usage
Real number (ℝ)

Missing 

Distinct: 57
Distinct (%): 0.1%
Missing: 1686
Missing (%): 2.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.230501
Minimum: 1
Maximum: 570
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 566.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
median: 10
Q3: 60
95-th percentile: 160
Maximum: 570
Range: 569
Interquartile range (IQR): 57

Descriptive statistics

Standard deviation: 63.149323
Coefficient of variation (CV): 1.5316167
Kurtosis: 13.033371
Mean: 41.230501
Median Absolute Deviation (MAD): 9
Skewness: 3.0309741
Sum: 2917965
Variance: 3987.837
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
3 | 24689 | 34.1%
2 | 6534 | 9.0%
30 | 5100 | 7.0%
40 | 4199 | 5.8%
20 | 4118 | 5.7%
50 | 4069 | 5.6%
100 | 2623 | 3.6%
60 | 2563 | 3.5%
1 | 2368 | 3.3%
80 | 2308 | 3.2%
Other values (47) | 12201 | 16.8%

Value | Count | Frequency (%)
1 | 2368 | 3.3%
2 | 6534 | 9.0%
3 | 24689 | 34.1%
4 | 448 | 0.6%
10 | 1361 | 1.9%
20 | 4118 | 5.7%
30 | 5100 | 7.0%
40 | 4199 | 5.8%
50 | 4069 | 5.6%
60 | 2563 | 3.5%

Value | Count | Frequency (%)
570 | 11 | < 0.1%
540 | 9 | < 0.1%
520 | 3 | < 0.1%
510 | 3 | < 0.1%
490 | 35 | < 0.1%
480 | 72 | 0.1%
470 | 39 | 0.1%
460 | 42 | 0.1%
450 | 48 | 0.1%
440 | 3 | < 0.1%

rooms
Real number (ℝ)

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4945486
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 566.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.7065374
Coefficient of variation (CV): 0.48834274
Kurtosis: -1.2689358
Mean: 3.4945486
Median Absolute Deviation (MAD): 1
Skewness: 0.0064184753
Sum: 253208
Variance: 2.91227
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Value | Count | Frequency (%)
2 | 12230 | 16.9%
3 | 12134 | 16.7%
5 | 12098 | 16.7%
1 | 12042 | 16.6%
6 | 11999 | 16.6%
4 | 11955 | 16.5%

Value | Count | Frequency (%)
1 | 12042 | 16.6%
2 | 12230 | 16.9%
3 | 12134 | 16.7%
4 | 11955 | 16.5%
5 | 12098 | 16.7%
6 | 11999 | 16.6%

Value | Count | Frequency (%)
6 | 11999 | 16.6%
5 | 12098 | 16.7%
4 | 11955 | 16.5%
3 | 12134 | 16.7%
2 | 12230 | 16.9%
1 | 12042 | 16.6%

recent_move_b
Boolean

Missing 

Distinct: 2
Distinct (%): < 0.1%
Missing: 1687
Missing (%): 2.3%
Memory size: 141.6 KiB

Value | Count | Frequency (%)
False | 61773 | 85.3%
True | 8998 | 12.4%
(Missing) | 1687 | 2.3%

Interactions


Correlations

Variable | Unnamed: 0 | age | code_column | gas_usage | health_ins | housing_type | income | is_employed | marital_status | num_vehicles | recent_move_b | rooms | sex
Unnamed: 0 | 1.000 | -0.002 | -0.161 | 0.026 | 0.096 | 0.056 | 0.013 | 0.018 | 0.026 | -0.014 | 0.026 | -0.001 | 0.000
age | -0.002 | 1.000 | -0.020 | 0.056 | 0.178 | 0.226 | 0.067 | 0.073 | 0.401 | -0.115 | 0.231 | -0.002 | 0.058
code_column | -0.161 | -0.020 | 1.000 | -0.063 | 0.100 | 0.070 | -0.014 | 0.021 | 0.033 | -0.018 | 0.040 | -0.006 | 0.000
gas_usage | 0.026 | 0.056 | -0.063 | 1.000 | 0.042 | 0.095 | 0.041 | 0.000 | 0.037 | 0.139 | 0.067 | 0.003 | 0.009
health_ins | 0.096 | 0.178 | 0.100 | 0.042 | 1.000 | 0.155 | 0.066 | 0.111 | 0.147 | 0.057 | 0.059 | 0.001 | 0.060
housing_type | 0.056 | 0.226 | 0.070 | 0.095 | 0.155 | 1.000 | 0.066 | 0.065 | 0.179 | 0.220 | 0.312 | 0.000 | 0.013
income | 0.013 | 0.067 | -0.014 | 0.041 | 0.066 | 0.066 | 1.000 | 0.057 | 0.070 | 0.105 | 0.019 | 0.000 | 0.128
is_employed | 0.018 | 0.073 | 0.021 | 0.000 | 0.111 | 0.065 | 0.057 | 1.000 | 0.096 | 0.086 | 0.023 | 0.000 | 0.000
marital_status | 0.026 | 0.401 | 0.033 | 0.037 | 0.147 | 0.179 | 0.070 | 0.096 | 1.000 | 0.211 | 0.125 | 0.003 | 0.160
num_vehicles | -0.014 | -0.115 | -0.018 | 0.139 | 0.057 | 0.220 | 0.105 | 0.086 | 0.211 | 1.000 | 0.123 | -0.001 | 0.072
recent_move_b | 0.026 | 0.231 | 0.040 | 0.067 | 0.059 | 0.312 | 0.019 | 0.023 | 0.125 | 0.123 | 1.000 | 0.000 | 0.000
rooms | -0.001 | -0.002 | -0.006 | 0.003 | 0.001 | 0.000 | 0.000 | 0.000 | 0.003 | -0.001 | 0.000 | 1.000 | 0.000
sex | 0.000 | 0.058 | 0.000 | 0.009 | 0.060 | 0.013 | 0.128 | 0.000 | 0.160 | 0.072 | 0.000 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
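
Nullity correlation can be illustrated by correlating per-column missingness indicators. The sketch below is a minimal pure-Python version on hypothetical toy records (not rows from this dataset), using the Pearson correlation of 0/1 "is missing" flags. Whether the report's heatmap uses exactly this estimator is an assumption, and the helper names `nullity` and `pearson` are illustrative.

```python
import math

# Hypothetical toy records; None marks a missing value.
records = [
    {"housing_type": "Rented", "num_vehicles": 2, "is_employed": True},
    {"housing_type": None, "num_vehicles": None, "is_employed": True},
    {"housing_type": "Rented", "num_vehicles": 1, "is_employed": None},
    {"housing_type": None, "num_vehicles": None, "is_employed": None},
    {"housing_type": "Rented", "num_vehicles": 0, "is_employed": True},
    {"housing_type": "Rented", "num_vehicles": 3, "is_employed": None},
]


def nullity(column):
    """Return 1.0 where the column is missing and 0.0 where it is present."""
    return [1.0 if row[column] is None else 0.0 for row in records]


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Columns that are always missing together correlate perfectly.
together = pearson(nullity("housing_type"), nullity("num_vehicles"))
# Columns whose missingness is unrelated correlate near zero.
unrelated = pearson(nullity("housing_type"), nullity("is_employed"))

print(f"housing_type vs num_vehicles: {abs(together):.2f}")
print(f"housing_type vs is_employed: {abs(unrelated):.2f}")
```

In this report, housing_type, num_vehicles, and gas_usage each have 1686 missing values and recent_move_b has 1687, so a strong nullity correlation among those four columns would be the pattern to look for.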

Sample

(index) | Unnamed: 0 | custid | sex | is_employed | income | marital_status | health_ins | housing_type | num_vehicles | age | state_of_res | code_column | gas_usage | rooms | recent_move_b
0 | 7 | 000006646_03 | Male | True | 22000.0 | Never married | True | Homeowner free and clear | 0.0 | 24 | Alabama | 1047 | 210.0 | 3 | F
1 | 8 | 000007827_01 | Female | NaN | 23200.0 | Divorced/Separated | True | Rented | 0.0 | 82 | Alabama | 1047 | 3.0 | 6 | T
2 | 9 | 000008359_04 | Female | True | 21000.0 | Never married | True | Homeowner with mortgage/loan | 2.0 | 31 | Alabama | 1047 | 40.0 | 3 | F
3 | 10 | 000008529_01 | Female | NaN | 37770.0 | Widowed | True | Homeowner free and clear | 1.0 | 93 | Alabama | 1047 | 120.0 | 2 | F
4 | 11 | 000008744_02 | Male | True | 39000.0 | Divorced/Separated | True | Rented | 2.0 | 67 | Alabama | 1047 | 3.0 | 2 | F
5 | 15 | 000011466_01 | Male | NaN | 11100.0 | Married | True | Homeowner free and clear | 2.0 | 76 | Alabama | 1047 | 200.0 | 6 | F
6 | 17 | 000015018_01 | Female | True | 25800.0 | Married | False | Rented | 2.0 | 26 | Alabama | 1047 | 3.0 | 3 | F
7 | 19 | 000017314_02 | Female | NaN | 34600.0 | Married | True | Homeowner free and clear | 2.0 | 73 | Alabama | 1047 | 50.0 | 5 | F
8 | 20 | 000017383_04 | Female | True | 25000.0 | Never married | True | Homeowner free and clear | 5.0 | 27 | Alabama | 1047 | 3.0 | 4 | F
9 | 21 | 000017554_02 | Male | True | 31200.0 | Married | True | Homeowner with mortgage/loan | 3.0 | 54 | Alabama | 1047 | 20.0 | 6 | F

(index) | Unnamed: 0 | custid | sex | is_employed | income | marital_status | health_ins | housing_type | num_vehicles | age | state_of_res | code_column | gas_usage | rooms | recent_move_b
72448 | 99990 | 001448933_01 | Male | True | 85000.0 | Married | False | Homeowner with mortgage/loan | 2.0 | 30 | Wyoming | 131 | 30.0 | 4 | F
72449 | 99991 | 001458068_02 | Female | True | 13000.0 | Married | True | Homeowner with mortgage/loan | 3.0 | 47 | Wyoming | 131 | 50.0 | 2 | F
72450 | 99993 | 001493692_02 | Female | True | 7200.0 | Never married | True | Homeowner with mortgage/loan | 3.0 | 33 | Wyoming | 131 | 30.0 | 4 | F
72451 | 99994 | 001494186_02 | Female | True | 44000.0 | Married | True | Homeowner with mortgage/loan | 2.0 | 46 | Wyoming | 131 | 90.0 | 3 | F
72452 | 99995 | 001501555_01 | Female | True | 85000.0 | Married | True | Homeowner with mortgage/loan | 2.0 | 32 | Wyoming | 131 | 70.0 | 5 | F
72453 | 99996 | 001506841_02 | Female | True | 18500.0 | Never married | False | Rented | 1.0 | 25 | Wyoming | 131 | 10.0 | 4 | F
72454 | 99997 | 001507219_01 | Female | NaN | 20800.0 | Widowed | True | Homeowner free and clear | 1.0 | 86 | Wyoming | 131 | 120.0 | 6 | F
72455 | 99998 | 001513103_01 | Male | True | 75000.0 | Married | True | Homeowner with mortgage/loan | 2.0 | 50 | Wyoming | 131 | 90.0 | 3 | F
72456 | 99999 | 001519624_01 | Female | True | 22200.0 | Divorced/Separated | False | Homeowner free and clear | 1.0 | 61 | Wyoming | 131 | 50.0 | 6 | F
72457 | 100000 | 001520877_01 | Male | True | 16400.0 | Never married | True | NaN | NaN | 31 | Wyoming | 131 | NaN | 5 | NaN